Physics-based numerical models represent the state of the art in Earth system modeling and constitute our best tools for generating insight and predictions. Despite rapid growth in computing power, the perceived need for higher model resolution overwhelms even the latest generation of computers, limiting modelers' ability to generate simulations for understanding parameter sensitivities and characterizing variability and uncertainty. Surrogate models are therefore often developed to capture the essential attributes of the full-blown numerical models. The recent success of machine learning methods, especially deep learning, across many disciplines offers the possibility that complex nonlinear connectionist representations may be able to capture the underlying complex structures and nonlinear processes of Earth systems. A difficult test for deep-learning-based emulation, which refers to the approximation of numerical models, is whether it can match traditional forms of surrogate models in computational efficiency while reproducing model results in a credible manner. A deep learning emulation that passes this test may be expected to perform even better than simple models at capturing complex processes and spatiotemporal dependencies. Here, we examine, with a case study in satellite-based remote sensing, whether deep learning methods can credibly represent the simulations from a surrogate model with comparable computational efficiency. Our results are encouraging in that the deep learning emulation reproduces the results with acceptable accuracy, often with even faster performance. We discuss the broader implications of our approach in light of the pace of improvement in high-performance implementations of deep learning and the appetite for higher-resolution simulations in the geosciences.
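As a hedged illustration of what emulation means here, the sketch below (the stand-in simulator, shapes, and network size are all hypothetical, not the paper's setup) fits a small neural network to input/output pairs generated by an inexpensive surrogate, then uses the fitted network as a fast approximation:

```python
# Minimal emulation sketch: learn a cheap approximation of a model
# from (parameter, output) pairs. All names here are illustrative.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def surrogate_model(params):
    """Stand-in for an expensive simulator: maps input parameters
    to a scalar output via an arbitrary nonlinear function."""
    return np.sin(params[:, 0]) * np.exp(-params[:, 1] ** 2)

# Build a training set by running the surrogate on sampled parameters.
X_train = rng.uniform(-2, 2, size=(5000, 2))
y_train = surrogate_model(X_train)

# The emulator: a small fully connected network.
emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
emulator.fit(X_train, y_train)

# At inference time the emulator replaces the simulator.
X_new = rng.uniform(-2, 2, size=(10, 2))
print(emulator.predict(X_new))   # fast approximate outputs
print(surrogate_model(X_new))    # reference values for comparison
```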
Property inference attacks against machine learning (ML) models aim to infer properties of the training data that are unrelated to the primary task of the model, and have so far been formulated as binary decision problems, i.e., whether or not the training data have a certain property. However, in industrial and healthcare applications, the proportion of labels in the training data is quite often also considered sensitive information. In this paper we introduce a new type of property inference attack that, unlike the binary decision problems in the literature, aims at inferring the class label distribution of the training data from the parameters of ML classifier models. We propose a method based on \emph{shadow training} and a \emph{meta-classifier} trained on the parameters of the shadow classifiers, augmented with the accuracy of those classifiers on auxiliary data. We evaluate the proposed approach for ML classifiers with fully connected neural network architectures. We find that the proposed \emph{meta-classifier} attack provides a maximum relative improvement of $52\%$ over the state of the art.
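A minimal sketch of the attack pipeline under illustrative assumptions (synthetic data, logistic-regression shadow models, and a random-forest meta-model standing in for the paper's fully connected architectures): shadow classifiers are trained on data with known label proportions, and a meta-model learns to map their parameters plus auxiliary accuracy to the label distribution.

```python
# Hedged sketch of shadow training + meta-classifier; not the paper's setup.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_aux = rng.normal(size=(500, 10))              # auxiliary data for the attacker
y_aux = (X_aux[:, 0] > 0).astype(int)

def shadow_features(pos_fraction, n=1000):
    """Train one shadow classifier on data with a chosen class balance and
    return the attack features: flattened parameters + auxiliary accuracy."""
    y = (rng.random(n) < pos_fraction).astype(int)
    X = rng.normal(size=(n, 10)) + y[:, None]   # class-dependent shift
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return np.concatenate([clf.coef_.ravel(), clf.intercept_,
                           [clf.score(X_aux, y_aux)]])

fractions = rng.uniform(0.1, 0.9, size=200)     # ground-truth label proportions
meta_X = np.stack([shadow_features(f) for f in fractions])
meta_model = RandomForestRegressor().fit(meta_X, fractions)

target = shadow_features(0.3)                   # pretend this is the target model
print(meta_model.predict([target]))             # inferred label proportion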
Gastrointestinal cancer is considered a fatal malignant condition of the organs in the gastrointestinal tract. Because of its mortality, there is an urgent need for medical image segmentation techniques that can segment the organs, reducing treatment time and enhancing treatment. Traditional segmentation techniques depend on hand-crafted features and are computationally expensive and inefficient. Vision transformers have gained immense popularity in many image classification and segmentation tasks. To address the problem from a transformer perspective, we introduce a hybrid CNN-Transformer architecture to segment the different organs from the images. The proposed solution is robust, scalable, and computationally efficient, achieving Dice and Jaccard coefficients of 0.79 and 0.72, respectively. The proposed solution also illustrates the essence of deep-learning-based automation for improving the effectiveness of treatment.
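The abstract does not specify the architecture in detail; the sketch below is one plausible minimal instantiation of a hybrid CNN-Transformer segmenter (all channel sizes and depths are assumptions): a CNN encoder downsamples the image, a transformer encoder models global context over the flattened feature tokens, and a convolutional head upsamples back to a per-pixel mask.

```python
# A minimal hybrid CNN-Transformer segmenter sketch; dimensions illustrative.
import torch
import torch.nn as nn

class HybridCNNTransformer(nn.Module):
    def __init__(self, in_ch=3, num_classes=2, dim=64):
        super().__init__()
        self.cnn = nn.Sequential(                 # CNN encoder, downsamples 4x
            nn.Conv2d(in_ch, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Sequential(                # decoder, upsamples back 4x
            nn.Upsample(scale_factor=4, mode="bilinear"),
            nn.Conv2d(dim, num_classes, 1),
        )

    def forward(self, x):
        f = self.cnn(x)                           # (B, C, H/4, W/4)
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)     # (B, H*W/16, C) tokens
        tokens = self.transformer(tokens)         # global context via attention
        f = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.head(f)                       # (B, classes, H, W) logits

mask_logits = HybridCNNTransformer()(torch.randn(1, 3, 64, 64))
print(mask_logits.shape)  # torch.Size([1, 2, 64, 64])
```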
In this work, we present our proposed method for segmenting the pulmonary arteries from CT scans using Swin UNETR and U-Net-based deep neural network architectures. Six models, three based on Swin UNETR and three based on 3D U-Net, are combined with a weighted average to produce the final segmentation mask. With this approach, our team achieved a multi-level Dice score of 84.36%. The code for our work is available at the following link: https://github.com/akansh12/parse2022. This work was part of the MICCAI PARSE 2022 challenge.
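A hedged sketch of the ensembling step (the weights, volume shape, and Dice helper are illustrative; the actual PARSE 2022 pipeline may differ): per-voxel foreground probabilities from the six models are combined with a weighted average and thresholded into the final mask.

```python
# Weighted-average ensembling of per-voxel probability maps; toy data only.
import numpy as np

rng = np.random.default_rng(0)
# Six models' per-voxel foreground probabilities for one CT volume.
probs = [rng.random((64, 64, 64)) for _ in range(6)]
weights = np.array([0.2, 0.2, 0.2, 0.15, 0.15, 0.1])  # sums to 1

fused = sum(w * p for w, p in zip(weights, probs))    # weighted average
final_mask = (fused > 0.5).astype(np.uint8)           # threshold into a mask

def dice(pred, gt):
    """Dice coefficient between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2 * inter / (pred.sum() + gt.sum() + 1e-8)

gt = rng.random((64, 64, 64)) > 0.5                   # placeholder ground truth
print(dice(final_mask, gt))
```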
Semantic similarity analysis and modeling is a fundamentally acclaimed task in many of today's pioneering applications of natural language processing. Owing to their sense of sequential pattern recognition, many neural networks such as RNNs and LSTMs have achieved satisfactory results in semantic similarity modeling. However, these solutions are considered inefficient due to their inability to process information in a non-sequential manner, which leads to improper context extraction. Transformers have become the state-of-the-art architecture thanks to advantages such as non-sequential data processing and self-attention. In this paper, we perform semantic similarity analysis and modeling on the U.S. Patent Phrase to Phrase Matching dataset using both traditional and transformer-based techniques. We experiment with four different variants of Decoding-enhanced BERT (DeBERTa) and enhance its performance with k-fold cross-validation. Experimental results demonstrate that our methodology outperforms traditional techniques, with an average Pearson correlation score of 0.79.
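A hedged sketch of the evaluation setup (the checkpoint name, fold count, and toy data are assumptions, and the fine-tuning loop is elided): a DeBERTa regression head scores each phrase pair, and the Pearson correlation is averaged across k folds.

```python
# K-fold cross-validation skeleton with Pearson scoring; model training elided.
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import KFold
from transformers import AutoModelForSequenceClassification

def train_and_predict(train_idx, val_idx, pairs, scores):
    """Fine-tune a DeBERTa regressor on the training fold and return
    validation predictions; num_labels=1 gives one regression output."""
    model = AutoModelForSequenceClassification.from_pretrained(
        "microsoft/deberta-v3-base", num_labels=1)   # downloads weights
    # ... tokenization and fine-tuning loop omitted for brevity ...
    return np.random.default_rng(1).random(len(val_idx))  # stand-in preds

pairs = [("acid absorption", "chemically soaked")] * 100  # toy phrase pairs
scores = np.random.default_rng(0).random(100)             # toy similarity labels

fold_corrs = []
for tr, va in KFold(n_splits=4, shuffle=True, random_state=0).split(pairs):
    preds = train_and_predict(tr, va, pairs, scores)
    fold_corrs.append(pearsonr(preds, scores[va])[0])
print(np.mean(fold_corrs))  # mean Pearson correlation across folds
```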
We focus on the audio-visual video parsing (AVVP) problem that involves detecting audio and visual event labels with temporal boundaries. The task is especially challenging since it is weakly supervised with only event labels available as a bag of labels for each video. An existing state-of-the-art model for AVVP uses a hybrid attention network (HAN) to generate cross-modal features for both audio and visual modalities, and an attentive pooling module that aggregates predicted audio and visual segment-level event probabilities to yield video-level event probabilities. We provide a detailed analysis of modality bias in the existing HAN architecture, where a modality is completely ignored during prediction. We also propose a variant of feature aggregation in HAN that leads to an absolute gain in F-scores of about 2% and 1.6% for visual and audio-visual events at both segment-level and event-level, in comparison to the existing HAN model.
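The abstract does not spell out the aggregation; the sketch below shows one common form of attentive pooling (dimensions and the weighting scheme are illustrative, not necessarily the HAN variant proposed here): per-segment event probabilities are combined with learned attention weights over time to yield video-level event probabilities.

```python
# Attentive pooling over segment-level predictions; shapes illustrative.
import torch
import torch.nn as nn

class AttentivePooling(nn.Module):
    def __init__(self, dim, num_events):
        super().__init__()
        self.attn = nn.Linear(dim, num_events)  # per-event temporal weights
        self.cls = nn.Linear(dim, num_events)   # per-segment event logits

    def forward(self, feats):                   # feats: (B, T, dim)
        seg_probs = torch.sigmoid(self.cls(feats))         # (B, T, E)
        weights = torch.softmax(self.attn(feats), dim=1)   # softmax over time
        return (weights * seg_probs).sum(dim=1)            # (B, E) video-level

audio = torch.randn(2, 10, 512)   # e.g. 10 one-second segments per video
pool = AttentivePooling(512, num_events=25)
print(pool(audio).shape)          # torch.Size([2, 25]) video-level probabilities
```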
We propose a technique for producing 'visual explanations' for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say 'dog' in a classification network or a sequence of words in a captioning network) flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept. Unlike previous approaches, Grad-CAM is applicable to a wide variety of CNN model families: (1) CNNs with fully-connected layers (e.g. VGG), (2) CNNs used for structured outputs (e.g. captioning), (3) CNNs used in tasks with multimodal inputs (e.g. visual question answering) or reinforcement learning, all without architectural changes or re-training. We combine Grad-CAM with existing fine-grained visualizations to create a high-resolution class-discriminative visualization.
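A compact sketch of the Grad-CAM computation as described above (the model, layer index, and target class are examples, and the input is a stand-in rather than a real preprocessed image): gradients of the target class score with respect to the final convolutional feature maps are global-average-pooled into channel weights, the weighted sum of activations is passed through a ReLU, and the result is upsampled into a coarse localization map.

```python
# Grad-CAM sketch on VGG-16; layer index and class are examples.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

model = vgg16(weights="IMAGENET1K_V1").eval()

x = torch.randn(1, 3, 224, 224)            # stand-in for a preprocessed image
A = model.features[:29](x)                 # activations of the last conv layer
A.retain_grad()                            # keep dScore/dA after backward

h = F.max_pool2d(F.relu(A), 2)             # remaining ReLU + pool (not in-place)
h = model.avgpool(h).flatten(1)
model.classifier(h)[0, 243].backward()     # score for class 243 (a dog class)

w = A.grad.mean(dim=(2, 3), keepdim=True)  # global-average-pooled gradients
cam = F.relu((w * A).sum(dim=1))           # weighted sum of feature maps + ReLU
cam = F.interpolate(cam[None], size=x.shape[-2:], mode="bilinear")[0]
print(cam.shape)                           # (1, 224, 224) coarse localization map
```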
Pose Machines provide a sequential prediction framework for learning rich implicit spatial models. In this work we show a systematic design for how convolutional networks can be incorporated into the pose machine framework for learning image features and image-dependent spatial models for the task of pose estimation. The contribution of this paper is to implicitly model long-range dependencies between variables in structured prediction tasks such as articulated pose estimation. We achieve this by designing a sequential architecture composed of convolutional networks that directly operate on belief maps from previous stages, producing increasingly refined estimates for part locations, without the need for explicit graphical model-style inference. Our approach addresses the characteristic difficulty of vanishing gradients during training by providing a natural learning objective function that enforces intermediate supervision, thereby replenishing back-propagated gradients and conditioning the learning procedure. We demonstrate state-of-the-art performance and outperform competing methods on standard benchmarks including the MPII, LSP, and FLIC datasets.
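A minimal sketch of the sequential-refinement idea (stage count, kernel sizes, and channel widths are illustrative, not the paper's exact design): each stage operates on shared image features concatenated with the previous stage's belief maps, and every stage receives its own loss term, giving the intermediate supervision that replenishes back-propagated gradients.

```python
# Multi-stage belief-map refinement with per-stage losses; toy dimensions.
import torch
import torch.nn as nn

class Stage(nn.Module):
    """One refinement stage: consumes image features plus the previous
    stage's belief maps and emits refined belief maps."""
    def __init__(self, feat_ch, num_parts):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_ch + num_parts, 128, 7, padding=3), nn.ReLU(),
            nn.Conv2d(128, num_parts, 1),
        )

    def forward(self, feats, beliefs):
        return self.net(torch.cat([feats, beliefs], dim=1))

feat_ch, num_parts = 32, 14
backbone = nn.Conv2d(3, feat_ch, 9, padding=4)   # shared image features
stages = nn.ModuleList(Stage(feat_ch, num_parts) for _ in range(3))

img = torch.randn(2, 3, 64, 64)
target = torch.rand(2, num_parts, 64, 64)        # ground-truth belief maps
feats = backbone(img)
beliefs = torch.zeros(2, num_parts, 64, 64)
loss = 0.0
for stage in stages:                             # sequential refinement
    beliefs = stage(feats, beliefs)
    loss = loss + nn.functional.mse_loss(beliefs, target)  # per-stage loss
loss.backward()  # intermediate supervision: gradients reach every stage
```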
Automatically describing an image with a sentence is a long-standing challenge in computer vision and natural language processing. Due to recent progress in object detection, attribute classification, action recognition, etc., there is renewed interest in this area. However, evaluating the quality of descriptions has proven to be challenging. We propose a novel paradigm for evaluating image descriptions that uses human consensus. This paradigm consists of three main parts: a new triplet-based method of collecting human annotations to measure consensus, a new automated metric (CIDEr) that captures consensus, and two new datasets: PASCAL-50S and ABSTRACT-50S that contain 50 sentences describing each image. Our simple metric captures human judgment of consensus better than existing metrics across sentences generated by various sources. We also evaluate five state-of-the-art image description approaches using this new protocol and provide a benchmark for future comparisons. A version of CIDEr named CIDEr-D is available as a part of MS COCO evaluation server to enable systematic evaluation and benchmarking.
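A simplified, hedged sketch in the spirit of CIDEr (the real metric additionally weights n-grams by corpus-level TF-IDF statistics, and CIDEr-D adds further refinements): candidate and reference sentences are mapped to n-gram count vectors and compared by cosine similarity, averaged over references and n-gram orders.

```python
# CIDEr-style consensus scoring, simplified: raw n-gram counts, no IDF.
from collections import Counter
import math

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def cosine(a, b):
    dot = sum(a[g] * b.get(g, 0) for g in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cider_like(candidate, references, max_n=4):
    """Average cosine similarity of n-gram vectors over references and n."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    score = 0.0
    for n in range(1, max_n + 1):
        c_vec = ngrams(cand, n)
        score += sum(cosine(c_vec, ngrams(r, n)) for r in refs) / len(refs)
    return score / max_n

refs = ["a dog runs on the grass", "a brown dog running in a field"]
print(cider_like("a dog running on grass", refs))
```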